Restrict label demotion to chemicals only #725

Draft
gaurav wants to merge 1 commit into add-pipeline-tests-for-shared-identifiers from fix-label-limit

gaurav (Collaborator) commented Apr 18, 2026

The previous config had a single global `demote_labels_longer_than: 25` that applied to every Biolink type. This caused legitimate disease and phenotype labels like "postural orthostatic tachycardia syndrome" and "Failure to thrive" to be dropped in favour of shorter, less informative alternatives from UMLS.

Change `demote_labels_longer_than` to a per-type dict (same pattern as `preferred_name_boost_prefixes`). Only `biolink:ChemicalEntity: 25` is set, so demotion now applies only to chemicals and their subtypes via ancestor traversal. Types with no entry are never demoted.

Extract the inline label-selection block from `write_compendium()` into a standalone `_select_preferred_label()` helper and add unit tests in `tests/babel_utils/test_write_compendia.py` with regression cases from the linked issues.
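For reviewers, here is a minimal sketch of how a per-type limit could be resolved by walking up the type hierarchy. It is not the code in this PR: `demotion_limit()` and `TOY_PARENTS` are hypothetical names, and the hard-coded parent map is a simplified slice of the Biolink Model standing in for whatever ancestor lookup the pipeline actually uses.

```python
from typing import Dict, Optional

# Hypothetical, simplified child -> parent slice of the Biolink hierarchy.
# The real pipeline would obtain ancestors from the Biolink Model itself.
TOY_PARENTS: Dict[str, str] = {
    "biolink:SmallMolecule": "biolink:MolecularEntity",
    "biolink:MolecularEntity": "biolink:ChemicalEntity",
    "biolink:ChemicalEntity": "biolink:NamedThing",
    "biolink:Disease": "biolink:DiseaseOrPhenotypicFeature",
    "biolink:PhenotypicFeature": "biolink:DiseaseOrPhenotypicFeature",
    "biolink:DiseaseOrPhenotypicFeature": "biolink:BiologicalEntity",
    "biolink:BiologicalEntity": "biolink:NamedThing",
}

# Mirrors the per-type config described above: only chemicals have a limit.
DEMOTE_LABELS_LONGER_THAN: Dict[str, int] = {"biolink:ChemicalEntity": 25}


def demotion_limit(
    biolink_type: str,
    limits: Dict[str, int] = DEMOTE_LABELS_LONGER_THAN,
    parents: Dict[str, str] = TOY_PARENTS,
) -> Optional[int]:
    """Return the limit of the nearest configured ancestor, or None."""
    current: Optional[str] = biolink_type
    while current is not None:
        if current in limits:
            return limits[current]
        # Step to the parent type; None ends the walk at the root.
        current = parents.get(current)
    return None
```

Under these assumptions, `demotion_limit("biolink:SmallMolecule")` resolves to the `biolink:ChemicalEntity` entry, while `demotion_limit("biolink:Disease")` returns `None`, meaning no demotion.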
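As a rough illustration of what a standalone selection helper makes testable, the sketch below picks a label from an ordered candidate list given an optional length limit. `pick_label_sketch()` is a hypothetical stand-in, not the real `_select_preferred_label()` (whose signature and full rules are not shown here), and the "POTS" alternative is an invented example label. It assumes demotion means "prefer candidates within the limit, but fall back to the first candidate if none qualify".

```python
from typing import Optional, Sequence


def pick_label_sketch(
    candidates: Sequence[str], limit: Optional[int]
) -> Optional[str]:
    """Pick a label from candidates, which are ordered best-first.

    limit=None means the type has no demotion entry, so the first
    candidate always wins regardless of length.
    """
    if not candidates:
        return None
    if limit is None:
        return candidates[0]
    # Assumed demotion rule: labels longer than the limit lose to any
    # candidate that fits within it.
    short_enough = [label for label in candidates if len(label) <= limit]
    return short_enough[0] if short_enough else candidates[0]


# Regression-style checks in the spirit of the linked issues.
labels = ["postural orthostatic tachycardia syndrome", "POTS"]
assert pick_label_sketch(labels, None) == labels[0]  # disease: no limit
assert pick_label_sketch(labels, 25) == "POTS"       # old global behaviour
```

Keeping the helper free of file I/O is what lets cases like these run as plain unit tests without driving `write_compendium()` end to end.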

Fixes #597, fixes #711, fixes #714, fixes #723

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
gaurav added a commit that referenced this pull request Apr 18, 2026
Drives write_compendium() against curated cliques and asserts on the
JSONL output's preferred_name. Covers the regressions fixed in PR #725
(#711, #714, #723) and the chemical demotion path. Tests run offline
by patching bmt.Toolkit to read pinned local Biolink Model files; a
network freshness test fails loudly when the fixture and config.yaml's
biolink_version drift apart.

Co-Authored-By: Claude Opus 4.7 <[email protected]>